feat(r/sedonadb): add CRS printing for sedonadb_dataframe #475
e-kotov wants to merge 19 commits into apache:main
Co-authored-by: Copilot <[email protected]>
Pull request overview
This pull request adds CRS (Coordinate Reference System) printing functionality to the sedonadb_dataframe print method. When printing a dataframe with geometry columns, the CRS information is now displayed below the header, showing the geometry column names along with their CRS identifiers (e.g., "EPSG:5070", "OGC:CRS84").
Key Changes:
- Added Rust function `parse_crs_metadata` to extract CRS information from GeoArrow metadata
- Enhanced `print.sedonadb_dataframe` to display geometry column CRS information with width-aware truncation
- Created comprehensive test suite covering various CRS scenarios including EPSG codes, engineering CRS, and edge cases
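To make "width-aware truncation" concrete, here is a minimal, std-only Rust sketch. It is not the PR's implementation (the PR does this in the R print method); the function names and the `# Geometry:` prefix are assumptions made for illustration. Only the `(CRS: EPSG:4326)` fragment mirrors a format mentioned in this review.

```rust
/// Hypothetical helper (not from the PR): build a one-line summary of
/// geometry columns and their CRS identifiers, then fit it to `width`.
/// The "# Geometry:" prefix is an assumed format.
fn format_crs_line(columns: &[(&str, &str)], width: usize) -> String {
    let body = columns
        .iter()
        .map(|(name, crs)| format!("{name} (CRS: {crs})"))
        .collect::<Vec<_>>()
        .join(", ");
    truncate_to_width(&format!("# Geometry: {body}"), width)
}

/// Truncate `line` to at most `width` characters, ending with "..." when
/// anything was cut. Counts `char`s, which is only an approximation of
/// terminal display width for non-ASCII text.
fn truncate_to_width(line: &str, width: usize) -> String {
    if line.chars().count() <= width {
        return line.to_string();
    }
    if width <= 3 {
        // Too narrow for any content plus an ellipsis.
        return ".".repeat(width);
    }
    let kept: String = line.chars().take(width - 3).collect();
    format!("{kept}...")
}
```

With a wide console the line is returned unchanged; with `width = 20` the single-column example above is cut to `# Geometry: geom ...`.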
Reviewed changes
Copilot reviewed 8 out of 8 changed files in this pull request and generated 7 comments.
| File | Description |
|---|---|
| r/sedonadb/src/rust/src/lib.rs | Implements parse_crs_metadata Rust function to parse CRS from GeoArrow JSON metadata |
| r/sedonadb/src/rust/api.h | Adds FFI declaration for the new parse_crs_metadata function |
| r/sedonadb/src/rust/Cargo.toml | Adds serde_json dependency for JSON parsing |
| r/sedonadb/src/init.c | Registers the new parse_crs_metadata C binding |
| r/sedonadb/R/000-wrappers.R | Adds R wrapper for parse_crs_metadata FFI function |
| r/sedonadb/R/crs.R | Introduces sd_parse_crs helper function for parsing CRS metadata |
| r/sedonadb/R/dataframe.R | Enhances print.sedonadb_dataframe to display CRS information for geometry columns with truncation support |
| r/sedonadb/tests/testthat/test-crs.R | Adds comprehensive tests for CRS parsing and display functionality |
paleolimbot left a comment
Thank you for working on this! I love the column count, CRS, and geometry column information output when printing. I added a few high-level suggestions, but with some tweaking we can merge the current approach too. The spirit of all my comments is that I'd love to reuse the places where we've already implemented some of this in Rust, Python, or geoarrow/r.
Something I added to the Python bindings but forgot to add here was a bare-bones wrapper around the SedonaDB/DataFusion schema, which provides access to column/type information including the CRS: https://github.com/apache/sedona-db/blob/e0e1d109480727faaf7be25923b57b4686144438/python/sedonadb/src/schema.rs . I added some suggestions inline about how to draw a few ideas from that hopefully without widening the scope of this PR too much 🙂
@paleolimbot I think I addressed all comments, but I might need some help with how to best approach the merge conflict.
paleolimbot left a comment
Apologies for the merge conflicts... I was trying to make it easier to develop the R package but it definitely conflicted with this PR 😬. You should be able to `git pull` and use `tools/update-savvy.sh` and `air format` with the package now.
This is looking great! A few things we should solve here I think but I love the improved output and I think this is close!
    test_that("sd_parse_crs handles empty string", {
      expect_snapshot(
        sedonadb:::sd_parse_crs(""),
        error = TRUE
      )
    })

This test feels like it should be renamed (or the behaviour modified such that it handles the empty string)
    out.set_name(0, "authority_code")?;
    out.set_name(1, "srid")?;
    out.set_name(2, "name")?;
    out.set_name(3, "proj_string")?;

It might be more appropriate to call this `input` (which is the term sf uses to describe this concept, sort of), or maybe `definition`. (A "proj string" carries the connotation of a specific format, which is not how this is typically formatted here.)
            inner: crs_arc.clone(),
        })
    } else {
        Err(savvy::Error::new("No CRS available for this geometry type"))

The docstring says this returns NULL for the "no crs" case. If this is hard to do with savvy maybe just update the docstring explaining that.
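One way to read this comment is as a contrast between "no CRS" being a normal value (NULL) and a real failure (error). The std-only sketch below illustrates that shape. `RValue` and `crs_or_null` are hypothetical stand-ins, not the savvy API or code from this PR.

```rust
/// Stand-in for a value handed back to R. This is NOT a savvy type;
/// it only models the NULL-versus-value distinction.
#[derive(Debug, PartialEq)]
enum RValue {
    Null,
    Crs(String),
}

/// Hypothetical accessor: `lookup` is the result of asking a type for its
/// CRS. A missing CRS maps to NULL, and only a genuine failure is an error.
fn crs_or_null(lookup: Result<Option<String>, String>) -> Result<RValue, String> {
    match lookup {
        Ok(Some(definition)) => Ok(RValue::Crs(definition)),
        // "No CRS" is an ordinary outcome, so it becomes NULL, not an error.
        Ok(None) => Ok(RValue::Null),
        Err(e) => Err(format!("Failed to get CRS: {e}")),
    }
}
```

If returning NULL turns out to be awkward in the real bindings, the alternative raised in the comment (keep the error and update the docstring) keeps code and documentation consistent just as well.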
    match self.inner.srid() {
        Ok(Some(srid)) => savvy::Sexp::try_from(srid as i32),
        Ok(None) => Ok(savvy::NullSexp.into()),
        Err(e) => Err(savvy::Error::new(format!("Failed to get SRID: {e}"))),

The docstring says this should return NULL for this case?
    /// Get a formatted CRS display string like " (CRS: EPSG:4326)" or empty string
    fn crs_display(&self) -> savvy::Result<savvy::Sexp> {
        use sedona_schema::datatypes::SedonaType;

        match &self.inner {

Do we need this one? (From R you can do `sd_type$crs()$display()`.)
I realised that the whole test suite can be revised with more realistic examples of printing the actual sedonadb_dataframe with CRS, so I will take a bit more time to work on this PR. I converted it to draft for now.
    // Use existing SedonaType infrastructure to parse the field
    let inner = SedonaType::from_storage_field(&field)
        .map_err(|e| savvy::Error::new(format!("Failed to create SedonaType: {e}")))?;

Suggested change:

    -    .map_err(|e| savvy::Error::new(format!("Failed to create SedonaType: {e}")))?;
    +    .map_err(|e| savvy_err!("Failed to create SedonaType: {e}"))?;

(I've been trying to consistently use savvy_err!() elsewhere but I'm new to this so the conventions aren't perfect)
    +-----------------------------+----------------------------+
    +-----------------------------+----------------------------+
    Preview of up to 0 row(s)

Should we add a snapshot test for printing without any geometry column?
No worries, and I will get back to you on the questions a bit later. Thanks for another thorough review!
Adds CRS printing to the `sedonadb_dataframe` print method, plus a helper in Rust that can be reused in other functions (but I kept it unexported for now).